Goal: To plot world heatmaps in Julia to visually represent country statistics. The approach handles all computations in Julia through DataFrames, and then executes a single command in R from within Julia (through RCall). I also provide a cleaned dataset containing map coordinates, which is ready to be used and identifies countries by their ISO codes.
Relevance of the Note: I was looking for a simple way to draw world heatmaps in Julia. While existing Julia packages such as GMT.jl offer more advanced functionalities, they may be overkill for simple uses. Furthermore, standard datasets for map coordinates in R usually require further cleaning before they're ready to be used. To address these issues, I provide a straightforward approach if you want to handle the data and computations through Julia's DataFrames.
Representing each country's share of the world income (fake data), the final outcome would be as follows
Links to Download the Files of this Note:
The Julia code is here under the name allCode.jl. If you wish to run the code with the same package versions used in this post, download instead allCode_withPkg.jl along with Project.toml and Manifest.toml.
The following script loads all the Julia and R packages needed for plotting heatmaps. Keep in mind that any code written in Julia between R""" and """ is interpreted as R code, a feature provided by the package RCall. [note] RCall occasionally fails to locate R after it has been installed. If this occurs, try specifying the path to R on your computer and then rebuilding the package. On my computer, these steps amount to running ENV["R_HOME"]="C:\\Program Files\\R\\R-4.3.0" and Pkg.build("RCall").
using DataFrames, RCall, CSV, Downloads
R"""
library(ggplot2)
library(svglite) #to save graphs in svg format, otherwise not necessary
"""
The mock data are stored in a DataFrame called our_df. The code for creating the DataFrame is omitted, as the specifics of how the mock data were generated are unimportant for our purposes. What matters is that, after cleaning our data and computing statistics, our_df will have the following variables.
our_df[1:5,:]
5×2 DataFrame
 Row │ iso3    share_income
     │ String  Float64
─────┼──────────────────────
   1 │ ABW         0.694218
   2 │ AFG         0.48446
   3 │ AGO         0.450032
   4 │ AIA         0.169835
   5 │ ALB         0.487509
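Since the code that builds our_df is omitted, the following is only a hypothetical sketch of how a table with the same two columns could be generated. The income figures are random placeholders, the three ISO codes are taken from the output above, and expressing the shares in percent is an assumption.

```julia
using DataFrames, Random

Random.seed!(1234)                      # fixed seed so the fake numbers are reproducible

iso3   = ["ABW", "AFG", "AGO"]          # a few ISO codes, taken from the output above
income = rand(length(iso3)) .* 1_000    # made-up income levels (placeholder values)

# share of each country in total income, in percent (assumed unit)
mock_df = DataFrame(iso3 = iso3, share_income = 100 .* income ./ sum(income))
```

Any DataFrame works for the rest of the note, as long as it has an iso3 column and a numeric column to plot.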
Now, we proceed to plot a world heatmap that represents the variable share_income.
To plot each country's share of the world's income, we need map coordinates. The ggplot2 package in R can provide this information by executing map_data("world"), which internally calls the maps package. However, this dataset requires further cleaning before it can be used for world maps. [note] For instance, the dataset doesn't include each country's ISO code by default. Adding this information requires matching country names through regular expressions, and the method is not guaranteed to work, as the documentation indicates. Building on these data, I provide a cleaned dataset for map coordinates at this link. The dataset also includes some variables that identify zones, enabling you to restrict the plot to specific regions.
The dataset with map coordinates can be merged with our DataFrame by running the code snippet below. This creates a new variable called merged_df, which is the DataFrame we'll use to plot the heatmap.
# we merge our dataframe with the file having map coordinates
link = Downloads.download("https://alfaromartino.github.io/data/countries_mapCoordinates.csv")
df_coordinates = DataFrame(CSV.File(link)) |> x-> dropmissing(x, :iso3)
merged_df = leftjoin(df_coordinates, our_df, on=:iso3)
merged_df
4×11 DataFrame
 Row │ long      lat       group    order  short_name_country  iso2      iso3     iso_name  zone           subzone          share_income
     │ Float64   Float64   Float64  Int64  String              String3?  String3  String?   String15?      String15?        Float64?
─────┼───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ -65.4285  43.5614     260.0  15259  Canada              CA        CAN      Canada    North America  North America        0.13938
   2 │  74.6131  40.2722     430.0  30429  China               CN        CHN      China     Asia           Asia                 2.7678
   3 │ -60.2417   5.25796    667.0  45666  Guyana              GY        GUY      Guyana    Latin America  South America        0.240533
   4 │ -87.7618  18.4461     985.0  61077  Mexico              MX        MEX      Mexico    Latin America  Central America      0.559792
Specifically, the map coordinates are represented through four variables: long, lat, group, and order. These variables and iso3 are all we need to identify and draw a world map. Additionally, share_income represents our statistic of interest.
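Because the merge is a left join on iso3, countries that appear in the coordinates but not in your statistics end up with a missing value for the plotted variable. A minimal sketch of how to check this on a toy example follows; the two small tables are illustrative stand-ins built from values shown above, not the real files.

```julia
using DataFrames

# toy coordinates: two points for Canada, one for Mexico
toy_coordinates = DataFrame(iso3 = ["CAN", "CAN", "MEX"],
                            long = [-65.4285, -65.4285, -87.7618],
                            lat  = [43.5614, 43.5614, 18.4461])

# toy statistics: only Canada has a value
toy_stats = DataFrame(iso3 = ["CAN"], share_income = [0.13938])

toy_merged = leftjoin(toy_coordinates, toy_stats, on = :iso3)

# rows of the coordinates without a match in the statistics
unmatched_codes = unique(antijoin(toy_coordinates, toy_stats, on = :iso3).iso3)
n_missing       = count(ismissing, toy_merged.share_income)
```

Here unmatched_codes lists the ISO codes for which no statistic was found, which is a quick way to spot typos in country codes before plotting.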
Once we've computed our statistics in Julia and identified countries by their ISO codes, the final step is to use R to create the heatmap. To accomplish this, we must first define a folder path in Julia, where our graphs will be stored. If you run the code I provided, the folder will be located at C:\maps.
isdir(joinpath("C:/", "maps")) || mkdir(joinpath("C:/", "maps")) #create 'maps' folder if it doesn't exist
graphs_folder = joinpath("C:/", "maps")
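The folder above is hard-coded to a Windows drive. If you work on another system, a sketch along the following lines may be more convenient. The helper make_graphs_folder is hypothetical (it is not part of the original code), and the temporary base directory is used only for illustration.

```julia
# hypothetical helper: create (if needed) and return a folder for the graphs
function make_graphs_folder(base_dir::AbstractString, name::AbstractString = "maps")
    folder = joinpath(base_dir, name)
    mkpath(folder)    # also creates missing parent folders; no error if the folder already exists
    return folder
end

# a temporary base directory, used here only as an example
example_folder = make_graphs_folder(mktempdir())
```

Replacing mktempdir() with homedir() or "C:/" would give a permanent location instead.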
To use R from within Julia, we need to convert the DataFrame into R's format. This can be done in two ways. The first option is to directly send merged_df to R, which creates an R data frame with the same name. However, this option is only recommended if you plan to further manipulate the data in R, which is not our case. The second option is to simply interpolate merged_df when we reference it in R.
Below, we provide the implementation of both options, starting with interpolation. After this, we'll exclusively use the second option, as it's simpler.
# we create the graph using RCall
R"""
#baseline code
our_gg <- ggplot() + geom_polygon(data = $(merged_df),
aes(x=long, y = lat, group = group,
fill=share_income))
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph0.svg"), plot = our_gg)
"""
@rput merged_df # we send our merged dataframe to R
# we create the graph using RCall
R"""
#baseline code
our_gg <- ggplot() + geom_polygon(data = merged_df,
aes(x=long, y = lat, group = group,
fill=share_income))
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph0.svg"), plot = our_gg)
"""
We refer to this graph as the baseline case, since it uses the default layout provided by ggplot2. The two approaches differ only in how the data reach R: the first block interpolates merged_df, whereas the second sends it to R through @rput. In both options, though, graphs_folder is directly interpolated, indicating to R where to store the graphs (we could've alternatively used @rput to send the variable graphs_folder to R).
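The two mechanisms can be compared on a tiny example that doesn't involve any plotting. This is only a sketch, and it assumes a working R installation that RCall can find; the vector incomes is made up for the illustration.

```julia
using RCall

incomes = [1.0, 2.0, 3.0]    # made-up numbers, only for illustration

# Option 1: copy the Julia variable into R under the same name, then use it in R
@rput incomes
total_rput = rcopy(R"sum(incomes)")

# Option 2: interpolate the Julia variable directly inside the R code
total_interp = rcopy(R"sum($incomes)")
```

Both calls return the same number to Julia. The difference is that, after @rput, an object called incomes also exists in the R session and can be reused by later R code.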
The fill option constitutes the only information you need to fill in if you use this code as a template. It indicates the variable to be plotted, which is share_income in our case. The rest of the code should remain the same, regardless of what you wish to plot. In particular, aes(x=long, y = lat, group = group) must be specified in the same way every time you use it, as it provides the necessary information for drawing a world map.
And that's it! You can use the code snippet as a template for your own graphs. The resulting graph is shown below.
As the baseline graph may not be visually appealing, we next provide further options to enhance its appearance. The final code we'll provide can also be used as a template that can be "copied and pasted", similar to the baseline script.
For each option added to the graph, we state the corresponding code and the resulting plot. We only change one option at a time, and you can skip the rest of this section if you wish to apply the code off the shelf. This code includes all the options we describe below, and produces the graph shown at the beginning of this note.
# we exclude Antarctica by dropping the rows whose country name matches "Antarct"
merged_df2 = merged_df[.!(occursin.(r"Antarct", merged_df.short_name_country)),:]
R"""
#base code
our_gg <- ggplot() + geom_polygon(data = $(merged_df2),
aes(x=long, y = lat, group = group, fill=share_income))
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph01a.svg"), plot = our_gg)
"""
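The row filter that creates merged_df2 can equivalently be written with filter from DataFrames. Below is a small sketch on a toy table, whose three rows are illustrative and not taken from the coordinates file.

```julia
using DataFrames

toy_df = DataFrame(short_name_country = ["Canada", "Antarctica", "Mexico"],
                   iso3               = ["CAN", "ATA", "MEX"])

# keep only the rows whose country name does NOT match "Antarct"
toy_without_antarctica = filter(:short_name_country => name -> !occursin(r"Antarct", name), toy_df)
```

Both forms return a new DataFrame and leave the original untouched.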
We first define the following function using RCall:
R"""
user_theme <- function(){
theme(
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank()
)
}
"""
We then plot the graph:
R"""
#base code
our_gg <- ggplot() + geom_polygon(data = $(merged_df),
aes(x=long, y = lat, group = group, fill=share_income))
#we get rid of background, grid, and latitude/longitude
our_gg <- our_gg + user_theme()
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph01b.svg"), plot = our_gg)
"""
R"""
#base code
our_gg <- ggplot() + geom_polygon(data = $(merged_df),
aes(x=long, y = lat, group = group, fill=share_income))
#changing aspect ratio
our_gg <- our_gg + coord_fixed(1.3)
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph01c.svg"), plot = our_gg)
"""
A quirky feature of ggplot2 is that lighter colors are associated with higher values. The following code inverts the color scale, so that darker colors correspond to higher values.
R"""
#base code
our_gg <- ggplot() + geom_polygon(data = $(merged_df),
aes(x=long, y = lat, group = group, fill=share_income))
#to change color scale
# see colors in R here https://derekogle.com/NCGraphing/resources/colors
our_gg <- our_gg + scale_fill_gradient(low="lightskyblue", high="steelblue4", name = "Income Share")
#additional options for the legend of color scale
our_gg <- our_gg + theme(
legend.position = "bottom",
legend.key.height = unit(0.5, "cm"),
legend.key.width = unit(2, "cm"))
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph01d.svg"), plot = our_gg)
"""
R"""
#base code + borders
our_gg <- ggplot() + geom_polygon(data = $(merged_df),
aes(x=long, y = lat, group = group, fill=share_income),
color = "white", linewidth=0.1)
#saving the graph
ggsave(filename = file.path($(graphs_folder),"file_graph01e.svg"), plot = our_gg)
"""
R"""
#base code
our_gg <- ggplot() + geom_polygon(data = $(merged_df),
aes(x=long, y = lat, group = group, fill=share_income))
#saving the graph
height <- 5
ggsave(filename = file.path($(graphs_folder),"file_graph01f.svg"), plot = our_gg, width = height * 3, height = height)
"""
You can use the following code off the shelf to plot a world heatmap. First, execute the following lines in Julia, without making any changes.
R"""
user_theme <- function(){
theme(
panel.background = element_blank(),
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_blank(),
axis.text.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_blank()
)
}
"""
Then, execute the script below by only making two modifications: insert the name of your Julia DataFrame at the option data, and indicate the variable to be plotted at fill. In our example, these options respectively correspond to merged_df2 (if we do not want to plot Antarctica, otherwise merged_df) and share_income.
merged_df2 = merged_df[.!(occursin.(r"Antarct", merged_df.short_name_country)),:]
R"""
#base code
our_gg <- ggplot() + geom_polygon(data = $(merged_df2),
aes(x=long, y = lat, group = group,
fill=share_income),
color = "white", linewidth=0.05)
#we get rid of background, grid, and latitude/longitude
our_gg <- our_gg + user_theme()
#changing aspect ratio
our_gg <- our_gg + coord_fixed(1.3)
#to change color scale (see color codes here https://derekogle.com/NCGraphing/resources/colors)
our_gg <- our_gg + scale_fill_gradient(low="lightskyblue", high="steelblue4", name = "Income Share")
#additional options for the legend of color scale
our_gg <- our_gg + theme(
legend.position = "bottom",
legend.key.height = unit(0.5, "cm"),
legend.key.width = unit(2, "cm"))
#saving the graph
height <- 5
ggsave(filename = file.path($(graphs_folder),"file_graph01.svg"), plot = our_gg, width = height * 3, height = height)
"""
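The same template can be applied to a single region, since the coordinates file includes the variables zone and subzone. The following sketch filters a toy table by zone. The rows mimic the output of merged_df shown earlier, and the code "XXX" with a missing zone is a hypothetical entry added to show how missing values are handled.

```julia
using DataFrames

toy_zones = DataFrame(iso3 = ["CAN", "GUY", "MEX", "XXX"],
                      zone = ["North America", "Latin America", "Latin America", missing])

# comparing a missing zone with a string returns missing, so we coalesce it to false
latin_america_df = filter(:zone => z -> coalesce(z == "Latin America", false), toy_zones)
```

A DataFrame filtered in this way can be passed to the data option of the template in place of merged_df2.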